Ruby Exercise - Lecture 1
  • Write the 2 Ruby programs described below.
  • Hand in just the Ruby code, not the results or any other output (email to: andrewj@bcm.edu).
  • I will run your program on the data file.
  • Comment your code in your own words using # comments.
  • Due Date: Jan 29, 2010 (but I don't recommend waiting; there will be exercises for the other labs in this course alone)

 
Exercise 1 - Read/Write a File, computing average expression scores
  • Download and uncompress this gene expression data file.
  • The file is a text table whose columns are tab-delimited. The third column contains gene expression scores across a panel of tissues in comma-separated value format. [ File format description ]
  • Write a Ruby program that:
    • Reads each line and, for that gene, computes the AVERAGE expression score for the first 10 tissues.
    • For each gene, output to a file:
      • The gene name and the average expression score for the first 10 tissues, as a tab-delimited table.
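
The steps above can be sketched roughly as follows. This is only a minimal sketch under assumptions not confirmed here: it assumes the gene name is in the first column and that every line has a third column of comma-separated numbers, so check the file format description first. The method names (average_for_line, write_averages) and the file names in the usage comment are hypothetical.

```ruby
# Sketch only. Assumption: column 1 is the gene name and column 3 holds the
# comma-separated expression scores (verify against the file format description).
TISSUE_COUNT = 10

# Returns [gene_name, average] for one tab-delimited line.
def average_for_line(line, tissue_count = TISSUE_COUNT)
  fields = line.chomp.split("\t")
  gene_name = fields[0]
  # Keep only the first tissue_count scores and convert them to floats.
  scores = fields[2].split(",").first(tissue_count).map { |score| score.to_f }
  # sum(0.0) forces floating-point arithmetic.
  average = scores.sum(0.0) / scores.size
  [gene_name, average]
end

# Reads every line from input and writes "gene<TAB>average" lines to output.
def write_averages(input, output)
  input.each_line do |line|
    next if line.strip.empty? # skip blank lines
    gene_name, average = average_for_line(line)
    output.puts "#{gene_name}\t#{average}"
  end
end

# Possible usage (hypothetical file names):
#   File.open("expression.txt") do |input|
#     File.open("averages.txt", "w") { |output| write_averages(input, output) }
#   end
```

Taking IO objects instead of file names keeps the parsing separate from the file handling, which makes it easy to try the methods on a few hand-written lines before running them on the full data file.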

 
Exercise 2 - Subset results for Genes of Interest
  • Say we are only interested in the average expression score for a class of novel genes that encode large proteins: the so-called KIAA genes.
  • But the gene names in the Affy expression data file are UCSC-specific IDs.
  • Write a Ruby program that uses an alias file to filter your two-column results file from Exercise #1.
  • Only report average expression data for genes having an alias that looks like "KIAA" followed by one or more digits. Use a _regular expression_ to find such aliases.
  • The alias file [ file format description ]
    • The alias file is also a tab-delimited text file.
    • The first column is the UCSC-specific gene ID and the second column is an alias for that gene
    • There can be multiple aliases for a given gene. You want to find ones that look like KIAA followed by one or more digits, using a regular expression.
  • For each record in your output for Exercise #1, only output it if it has a "KIAA" type alias.
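
One possible shape for this filter is sketched below; it is not the only way to do it. The sketch assumes the alias must be exactly "KIAA" plus digits (the pattern is anchored with \A and \z; drop the anchors if a looser match is wanted) and that the first column of the Exercise #1 output is the same UCSC ID used in the alias file. The names KIAA_PATTERN, kiaa_gene_ids and filter_results are hypothetical.

```ruby
# Sketch only. Assumption: an alias "looks like KIAA" when the whole alias is
# "KIAA" followed by one or more digits.
KIAA_PATTERN = /\AKIAA\d+\z/

# Scans the alias file and collects the gene IDs that have at least one
# KIAA-style alias. A gene may appear on several lines, one per alias.
def kiaa_gene_ids(alias_input)
  ids = {}
  alias_input.each_line do |line|
    gene_id, gene_alias = line.chomp.split("\t")
    next if gene_id.nil? || gene_alias.nil? # skip blank or malformed lines
    ids[gene_id] = true if gene_alias =~ KIAA_PATTERN
  end
  ids
end

# Copies to output only those result lines whose gene ID (first column)
# was found by kiaa_gene_ids.
def filter_results(results_input, ids, output)
  results_input.each_line do |line|
    gene_id = line.split("\t").first
    output.puts line.chomp if ids.key?(gene_id)
  end
end
```

Reading the alias file once into a hash means each record from Exercise #1 needs only a single lookup, rather than a rescan of the alias file per gene.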